Skip to content

Conversation

@scottmcm
Copy link
Member

@scottmcm scottmcm commented Feb 9, 2025

Inspired by this zulip conversation: https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990

Draft for now because it needs #136735 to get the codegen tests to pass.

@rustbot
Copy link
Collaborator

rustbot commented Feb 9, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 9, 2025
@scottmcm
Copy link
Member Author

scottmcm commented Feb 9, 2025

Let's see whether it actually improves things:
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 9, 2025
@bors
Copy link
Collaborator

bors commented Feb 9, 2025

⌛ Trying commit f7970b3 with merge 30df00c...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2025
Simplify `slice::Iter::next` enough that it inlines

Inspired by this zulip conversation: <https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990>

Draft for now because it needs rust-lang#136735 to get the codegen tests to pass.
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Feb 9, 2025

☀️ Try build successful - checks-actions
Build commit: 30df00c (30df00cd8218095deb80cc6b913de02f5ae4a5b0)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (30df00c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.4%, 0.4%] 1
Regressions ❌
(secondary)
9.4% [9.4%, 9.4%] 1
Improvements ✅
(primary)
-0.6% [-2.2%, -0.1%] 210
Improvements ✅
(secondary)
-0.5% [-1.1%, -0.1%] 131
All ❌✅ (primary) -0.6% [-2.2%, 0.4%] 211

Max RSS (memory usage)

Results (primary -2.2%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.4% [1.7%, 2.8%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.4% [-8.5%, -2.4%] 6
Improvements ✅
(secondary)
-2.5% [-2.5%, -2.5%] 1
All ❌✅ (primary) -2.2% [-8.5%, 2.8%] 9

Cycles

Results (primary -1.0%, secondary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.6% [4.6%, 4.6%] 1
Improvements ✅
(primary)
-1.0% [-1.1%, -0.9%] 2
Improvements ✅
(secondary)
-1.1% [-1.1%, -1.1%] 1
All ❌✅ (primary) -1.0% [-1.1%, -0.9%] 2

Binary size

Results (primary 0.1%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.0%, 5.0%] 28
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.5%, -0.0%] 45
Improvements ✅
(secondary)
-0.1% [-1.3%, -0.0%] 48
All ❌✅ (primary) 0.1% [-0.5%, 5.0%] 73

Bootstrap: 780.488s -> 778.559s (-0.25%)
Artifact size: 329.04 MiB -> 329.26 MiB (0.06%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 9, 2025
@scottmcm scottmcm force-pushed the poke-slice-iter-next branch from f7970b3 to 85deb4d Compare February 11, 2025 09:51
@scottmcm scottmcm removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 11, 2025
pub fn slice_iter_next<'a>(it: &mut std::slice::Iter<'a, u32>) -> Option<&'a u32> {
// CHECK: %[[ENDP:.+]] = getelementptr inbounds{{( nuw)?}} i8, ptr %it, {{i32 4|i64 8}}
// CHECK: %[[END:.+]] = load ptr, ptr %[[ENDP]]
// CHECK: %[[START:.+]] = load ptr, ptr %it,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased atop the transmute-gives-asserts change; codegen tests should be passing now with just this trivial change that it's loading the start pointer first instead of the end pointer first.

let mut _0: ();
let mut _11: std::slice::Iter<'_, T>;
let mut _12: std::iter::Enumerate<std::slice::Iter<'_, T>>;
let mut _13: std::iter::Enumerate<std::slice::Iter<'_, T>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see that the Enumerate iterators get completely SRoAed even just in MIR, with this!

@scottmcm
Copy link
Member Author

Just to check that having the assumes in the LLVM-IR doesn't somehow lose all of the gains:
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 11, 2025
@bors
Copy link
Collaborator

bors commented Feb 11, 2025

⌛ Trying commit 85deb4d with merge eaf73cd...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 11, 2025
Simplify `slice::Iter::next` enough that it inlines

Inspired by this zulip conversation: <https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990>

Draft for now because it needs rust-lang#136735 to get the codegen tests to pass.
@bors
Copy link
Collaborator

bors commented Feb 11, 2025

☀️ Try build successful - checks-actions
Build commit: eaf73cd (eaf73cd9c4fc608d616056b232ccb9cbb8df5679)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (eaf73cd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.4%, 0.5%] 2
Regressions ❌
(secondary)
9.3% [9.3%, 9.3%] 1
Improvements ✅
(primary)
-0.6% [-2.2%, -0.1%] 196
Improvements ✅
(secondary)
-0.5% [-2.4%, -0.1%] 101
All ❌✅ (primary) -0.6% [-2.2%, 0.5%] 198

Max RSS (memory usage)

Results (primary -1.8%, secondary -3.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.8% [2.1%, 3.3%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.9% [-8.7%, -0.7%] 13
Improvements ✅
(secondary)
-3.2% [-7.1%, -2.0%] 29
All ❌✅ (primary) -1.8% [-8.7%, 3.3%] 16

Cycles

Results (primary -1.4%, secondary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.0% [2.1%, 5.4%] 5
Improvements ✅
(primary)
-1.4% [-1.8%, -1.0%] 3
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) -1.4% [-1.8%, -1.0%] 3

Binary size

Results (primary 0.0%, secondary -0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.0%, 3.9%] 38
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.3% [-1.8%, -0.1%] 50
Improvements ✅
(secondary)
-0.2% [-2.0%, -0.0%] 83
All ❌✅ (primary) 0.0% [-1.8%, 3.9%] 88

Bootstrap: 785.339s -> 785.217s (-0.02%)
Artifact size: 348.32 MiB -> 348.38 MiB (0.02%)

@scottmcm scottmcm reopened this Feb 14, 2025
@scottmcm scottmcm marked this pull request as ready for review February 14, 2025 18:40
@scottmcm
Copy link
Member Author

scottmcm commented Feb 14, 2025

With #136735 having landed, this is good for a review now.

@rustbot ready

The one large regression is secondary, and looks to be one of the codegen units just having way more to do in LLVM:
image

// safe since we check if the iterator is empty first.
let ptr = self.ptr;
let end_or_len = self.end_or_len;
// SAFETY: Type invariants.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety comment is a bit too sloppy imo. At least it should say something like "same as above" if you want to avoid repetition. Or maybe split it into two unsafe blocks, one for each arm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, true.

Weirdly when I added tighter-scoped unsafe blocks it stopped inlining (and I even rebuilt to check because that's so strange), but I added more specific comments inside a bigger block.

…ffset`

Probably reasonable anyway since it more obviously drops provenance.
This adds a few more statements to `next`, but optimizes better in the loops (saving 2 blocks in `forward_loop`, for example)
@scottmcm scottmcm force-pushed the poke-slice-iter-next branch from 4ddbbbd to 7add358 Compare February 15, 2025 07:03
Copy link
Member

@joboet joboet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and the benchmarks speak for themselves – I think we can justify the one regression.

I have one silly nit, which you are welcome to ignore. Otherwise, r=me

@scottmcm
Copy link
Member Author

@bors r=joboet

@bors
Copy link
Collaborator

bors commented Feb 20, 2025

📌 Commit 7add358 has been approved by joboet

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 20, 2025
@bors
Copy link
Collaborator

bors commented Feb 20, 2025

⌛ Testing commit 7add358 with merge f04bbc6...

@bors
Copy link
Collaborator

bors commented Feb 20, 2025

☀️ Test successful - checks-actions
Approved by: joboet
Pushing f04bbc6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 20, 2025
@bors bors merged commit f04bbc6 into rust-lang:master Feb 20, 2025
7 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Feb 20, 2025
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f04bbc6): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.4%, 0.4%] 2
Regressions ❌
(secondary)
8.8% [8.8%, 8.8%] 1
Improvements ✅
(primary)
-0.4% [-1.4%, -0.1%] 123
Improvements ✅
(secondary)
-0.5% [-2.3%, -0.1%] 70
All ❌✅ (primary) -0.4% [-1.4%, 0.4%] 125

Max RSS (memory usage)

Results (primary -2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
7.3% [5.5%, 9.0%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-5.8% [-9.1%, -1.8%] 5
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -2.1% [-9.1%, 9.0%] 7

Cycles

Results (primary -1.2%, secondary 0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.8% [3.8%, 3.8%] 1
Improvements ✅
(primary)
-1.2% [-2.3%, -0.8%] 8
Improvements ✅
(secondary)
-2.0% [-2.0%, -2.0%] 1
All ❌✅ (primary) -1.2% [-2.3%, -0.8%] 8

Binary size

Results (primary -0.1%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.0%, 0.8%] 10
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.4%, -0.0%] 58
Improvements ✅
(secondary)
-0.3% [-1.6%, -0.0%] 11
All ❌✅ (primary) -0.1% [-0.4%, 0.8%] 68

Bootstrap: 774.026s -> 774.477s (0.06%)
Artifact size: 360.27 MiB -> 361.04 MiB (0.21%)

@scottmcm scottmcm deleted the poke-slice-iter-next branch February 21, 2025 05:48
@rylev
Copy link
Member

rylev commented Feb 25, 2025

Perf improvements vastly outweigh the regressions

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants